Design of a Digital Library for Early 20th Century Medico-legal Documents

Authors

  • George R. Thoma
  • Song Mao
  • Dharitri Misra
  • John Rees
Abstract

A digital library now being designed at the U.S. National Library of Medicine enhances the research value of a collection of important government documents for historians of medicine and law. This paper presents work toward the design of a system for preserving and providing access to this material, focusing mainly on the automated extraction of the descriptive metadata needed for future access. Since manual entry of these metadata for thousands of documents is unaffordable, automation is required. Successful metadata extraction relies on accurate classification of key textlines in the document. Methods are described for choosing among scanning alternatives to achieve high OCR conversion performance, and for combining a Support Vector Machine (SVM) with a Hidden Markov Model (HMM) to classify textlines and extract metadata. Experimental results from our initial research toward an optimal textline classifier and metadata extractor are given.

Similar resources

[The influence of infectious diseases on ancient maritime navigation and the earliest attempts to control them through codes].

This article describes the influence of infectious diseases on ancient maritime navigation and the early attempts to prevent their spread with legal regulations. In ancient times, the greatest health hazards for sailors were poor hygienic conditions, water supply, nutrition, accommodation, air, and lighting on board. These conditions favoured the development and transmission of infectious diseas...

Hidden worlds of the early knowledge economy: libraries in British companies before the middle of the 20th century

The ‘knowledge economy’ is a key facet of the proposition that Western societies have entered a fundamentally new phase of development as a result of rapid advances in digital technology. However, this proposition underplays the historic importance of knowledge to economic activity and performance. One example of the past intersection of knowledge transfer and economic development is the in-hou...

Imagining the Northwest : A Digital Library Partnership in Oregon

This paper documents the development of a digital library of still images created by photographer Lee Moorhouse on the Umatilla Indian Reservation at the turn of the 20th century. The University of Oregon Libraries, working with the Tamastslikt Cultural Institute of the Confederated Tribes of the Umatilla, developed a Dublin Core compliant metadata structure. The metadata structure accommodates...

Publication date: 2006